Results 1 - 20 of 221
1.
Nat Methods ; 2024 May 14.
Article in English | MEDLINE | ID: mdl-38744917

ABSTRACT

AlphaFold2 revolutionized structural biology with the ability to predict protein structures with exceptionally high accuracy. Its implementation, however, lacks the code and data required to train new models. These are necessary to (1) tackle new tasks, like protein-ligand complex structure prediction, (2) investigate the process by which the model learns and (3) assess the model's capacity to generalize to unseen regions of fold space. Here we report OpenFold, a fast, memory efficient and trainable implementation of AlphaFold2. We train OpenFold from scratch, matching the accuracy of AlphaFold2. Having established parity, we find that OpenFold is remarkably robust at generalizing even when the size and diversity of its training set is deliberately limited, including near-complete elisions of classes of secondary structure elements. By analyzing intermediate structures produced during training, we also gain insights into the hierarchical manner in which OpenFold learns to fold. In sum, our studies demonstrate the power and utility of OpenFold, which we believe will prove to be a crucial resource for the protein modeling community.

2.
bioRxiv ; 2024 Apr 26.
Article in English | MEDLINE | ID: mdl-38712088

ABSTRACT

Tissue structure and molecular circuitry in the colon can be profoundly impacted by systemic age-related effects, but many of the underlying molecular cues remain unclear. Here, we built a cellular and spatial atlas of the colon across three anatomical regions and 11 age groups, encompassing ~1,500 mouse gut tissues profiled by spatial transcriptomics and ~400,000 single nucleus RNA-seq profiles. We developed a new computational framework, cSplotch, which learns a hierarchical Bayesian model of spatially resolved cellular expression associated with age, tissue region, and sex, by leveraging histological features to share information across tissue samples and data modalities. Using this model, we identified cellular and molecular gradients along the adult colonic tract and across the main crypt axis, and multicellular programs associated with aging in the large intestine. Our multi-modal framework for the investigation of cell and tissue organization can aid in the understanding of cellular roles in tissue-level pathology.

4.
Genome Biol ; 25(1): 88, 2024 Apr 08.
Article in English | MEDLINE | ID: mdl-38589899

ABSTRACT

Inferring gene regulatory networks (GRNs) from single-cell data is challenging due to heuristic limitations. Existing methods also lack estimates of uncertainty. Here we present Probabilistic Matrix Factorization for Gene Regulatory Network Inference (PMF-GRN). Using single-cell expression data, PMF-GRN infers latent factors capturing transcription factor activity and regulatory relationships. Using variational inference allows hyperparameter search for principled model selection and direct comparison to other generative models. We extensively test and benchmark our method using real single-cell datasets and synthetic data. We show that PMF-GRN infers GRNs more accurately than current state-of-the-art single-cell GRN inference methods, offering well-calibrated uncertainty estimates.


Subject(s)
Algorithms , Gene Regulatory Networks
5.
bioRxiv ; 2024 Feb 29.
Article in English | MEDLINE | ID: mdl-38464323

ABSTRACT

Microbiome studies have revealed gut microbiota's potential impact on complex diseases. However, many studies often focus on one disease per cohort. We developed a meta-analysis workflow for gut microbiome profiles and analyzed shotgun metagenomic data covering 11 diseases. Using interpretable machine learning and differential abundance analysis, our findings reinforce the generalization of binary classifiers for Crohn's disease (CD) and colorectal cancer (CRC) to hold-out cohorts and highlight the key microbes driving these classifications. We identified high microbial similarity in disease pairs like CD vs ulcerative colitis (UC), CD vs CRC, Parkinson's disease vs type 2 diabetes (T2D), and schizophrenia vs T2D. We also found strong inverse correlations in Alzheimer's disease vs CD and UC. These findings detected by our pipeline provide valuable insights into these diseases.

6.
ChemistryOpen ; : e202300263, 2024 Mar 01.
Article in English | MEDLINE | ID: mdl-38426687

ABSTRACT

Organophosphates (OPs) are a class of neurotoxic acetylcholinesterase inhibitors including widely used pesticides as well as nerve agents such as VX and VR. Current treatment of these toxins relies on reactivating acetylcholinesterase, which remains ineffective. Enzymatic scavengers are of interest for their ability to degrade OPs systemically before they reach their target. Here we describe a library of computationally designed variants of phosphotriesterase (PTE), an enzyme that is known to break down OPs. The mutations G208D, F104A, K77A, A80V, H254G, and I274N broadly improve catalytic efficiency of VX and VR hydrolysis without impacting the structure of the enzyme. The mutation I106A improves catalysis of VR and L271E abolishes activity, likely due to disruptions of PTE's structure. This study elucidates the importance of these residues and contributes to the design of enzymatic OP scavengers with improved efficiency.

7.
J Allergy Clin Immunol ; 153(4): 954-968, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38295882

ABSTRACT

Studies of asthma and allergy are generating increasing volumes of omics data for analysis and interpretation. The National Institute of Allergy and Infectious Diseases (NIAID) assembled a workshop comprising investigators studying asthma and allergic diseases using omics approaches, omics investigators from outside the field, and NIAID medical and scientific officers to discuss the following areas in asthma and allergy research: genomics, epigenomics, transcriptomics, microbiomics, metabolomics, proteomics, lipidomics, integrative omics, systems biology, and causal inference. Current states of the art, present challenges, novel and emerging strategies, and priorities for progress were presented and discussed for each area. This workshop report summarizes the major points and conclusions from this NIAID workshop. As a group, the investigators underscored the imperatives for rigorous analytic frameworks, integration of different omics data types, cross-disciplinary interaction, strategies for overcoming current limitations, and the overarching goal to improve scientific understanding and care of asthma and allergic diseases.


Subject(s)
Asthma , Hypersensitivity , United States , Humans , National Institute of Allergy and Infectious Diseases (U.S.) , Hypersensitivity/genetics , Asthma/etiology , Genomics , Proteomics , Metabolomics
8.
Genome Biol ; 25(1): 24, 2024 Jan 18.
Article in English | MEDLINE | ID: mdl-38238840

ABSTRACT

BACKGROUND: Modeling of gene regulatory networks (GRNs) is limited due to a lack of direct measurements of genome-wide transcription factor activity (TFA), making it difficult to separate covariance and regulatory interactions. Inference of regulatory interactions and TFA requires aggregation of complementary evidence. Estimating TFA explicitly is problematic as it disconnects GRN inference and TFA estimation and is unable to account for, for example, contextual transcription factor-transcription factor interactions, and other higher order features. Deep learning offers a potential solution, as it can model complex interactions and higher-order latent features, although it does not provide interpretable models and latent features. RESULTS: We propose a novel autoencoder-based framework, StrUcture Primed Inference of Regulation using latent Factor ACTivity (SupirFactor) for modeling, and a metric, explained relative variance (ERV), for interpretation of GRNs. We evaluate SupirFactor with ERV in a wide set of contexts. Compared to current state-of-the-art GRN inference methods, SupirFactor performs favorably. We evaluate latent feature activity as an estimate of TFA and biological function in S. cerevisiae as well as in peripheral blood mononuclear cells (PBMC). CONCLUSION: Here we present a framework for structure-primed inference and interpretation of GRNs, SupirFactor, demonstrating interpretability using ERV in multiple biological and experimental settings. SupirFactor enables TFA estimation and pathway analysis using latent factor activity, demonstrated here on two large-scale single-cell datasets, modeling S. cerevisiae and PBMC. We find that the SupirFactor model facilitates biological analysis acquiring novel functional and regulatory insight.


Subject(s)
Gene Regulatory Networks , Saccharomyces cerevisiae , Saccharomyces cerevisiae/genetics , Algorithms , Leukocytes, Mononuclear , Transcription Factors/genetics
9.
Biomacromolecules ; 25(1): 258-271, 2024 Jan 08.
Article in English | MEDLINE | ID: mdl-38110299

ABSTRACT

Protein hydrogels represent an important and growing biomaterial for a multitude of applications, including diagnostics and drug delivery. We have previously explored the ability to engineer the thermoresponsive supramolecular assembly of coiled-coil proteins into hydrogels with varying gelation properties, where we have defined important parameters in the coiled-coil hydrogel design. Using Rosetta energy scores and Poisson-Boltzmann electrostatic energies, we iterate a computational design strategy to predict the gelation of coiled-coil proteins while simultaneously exploring five new coiled-coil protein hydrogel sequences. Provided this library, we explore the impact of in silico energies on structure and gelation kinetics, where we also reveal a range of blue autofluorescence that enables hydrogel disassembly and recovery. As a result of this library, we identify the new coiled-coil hydrogel sequence, Q5, capable of gelation within 24 h at 4 °C, a more than 2-fold increase over that of our previous iteration Q2. The fast gelation time of Q5 enables the assessment of structural transition in real time using small-angle X-ray scattering (SAXS) that is correlated to coarse-grained and atomistic molecular dynamics simulations revealing the supramolecular assembling behavior of coiled-coils toward nanofiber assembly and gelation. This work represents the first system of hydrogels with predictable self-assembly, autofluorescent capability, and a molecular model of coiled-coil fiber formation.


Subject(s)
Molecular Dynamics Simulation , Proteins , Scattering, Small Angle , X-Ray Diffraction , Proteins/chemistry , Hydrogels
10.
ACS Appl Nano Mater ; 6(22): 21245-21257, 2023 Nov 24.
Article in English | MEDLINE | ID: mdl-38037605

ABSTRACT

Theranostic materials research is experiencing rapid growth driven by the interest in integrating both therapeutic and diagnostic modalities. These materials offer the unique capability to not only provide treatment but also track the progression of a disease. However, to create an ideal theranostic biomaterial without compromising drug encapsulation, diagnostic imaging must be optimized for improved sensitivity and spatial localization. Herein, we create a protein-engineered fluorinated coiled-coil fiber, Q2TFL, capable of improved sensitivity to 19F magnetic resonance spectroscopy (MRS) detection. Leveraging residue-specific noncanonical amino acid incorporation of trifluoroleucine (TFL) into the coiled-coil, Q2, which self-assembles into nanofibers, we generate Q2TFL. We demonstrate that fluorination results in a greater increase in thermostability and 19F magnetic resonance detection compared to the nonfluorinated parent, Q2. Q2TFL also exhibits linear ratiometric 19F MRS thermoresponsiveness, allowing it to act as a temperature probe. Furthermore, we explore the ability of Q2TFL to encapsulate the anti-inflammatory small molecule, curcumin (CCM), and its impact on the coiled-coil structure. Q2TFL also provides hyposignal contrast in 1H MRI, echogenic signal with high-frequency ultrasound and sensitive detection by 19F MRS in vivo illustrating fluorination of coiled-coils for supramolecular assembly and their use with 1H MRI, 19F MRS and high frequency ultrasound as multimodal theranostic agents.

11.
bioRxiv ; 2023 Nov 26.
Article in English | MEDLINE | ID: mdl-38045331

ABSTRACT

The sequence-structure-function relationships that ultimately generate the diversity of extant observed proteins are complex, as proteins bridge the gap between multiple informational and physical scales involved in nearly all cellular processes. One limitation of existing protein annotation databases such as UniProt is that less than 1% of proteins have experimentally verified functions, and computational methods are needed to fill in the missing information. Here, we demonstrate that a multi-aspect framework based on protein language models can learn sequence-structure-function representations of amino acid sequences, and can provide the foundation for sensitive sequence-structure-function aware protein sequence search and annotation. Based on this model, we introduce a multi-aspect information retrieval system for proteins, Protein-Vec, covering sequence, structure, and function aspects, that enables computational protein annotation and function prediction at tree-of-life scales.

12.
bioRxiv ; 2023 Sep 23.
Article in English | MEDLINE | ID: mdl-37790443

ABSTRACT

Cells respond to environmental and developmental stimuli by remodeling their transcriptomes through regulation of both mRNA transcription and mRNA decay. A central goal of biology is identifying the global set of regulatory relationships between factors that control mRNA production and degradation and their target transcripts and constructing a predictive model of gene expression. Regulatory relationships are typically identified using transcriptome measurements and causal inference algorithms. RNA kinetic parameters are determined experimentally by employing run-on or metabolic labeling (e.g. 4-thiouracil) methods that allow transcription and decay rates to be separately measured. Here, we develop a deep learning model, trained with single-cell RNA-seq data, that both infers causal regulatory relationships and estimates RNA kinetic parameters. The resulting in silico model predicts future gene expression states and can be perturbed to simulate the effect of transcription factor changes. We acquired model training data by sequencing the transcriptomes of 175,000 individual Saccharomyces cerevisiae cells that were subject to an external perturbation and continuously sampled over a one-hour period. The rate of change for each transcript was calculated on a per-cell basis to estimate RNA velocity. We then trained a deep learning model with transcriptome and RNA velocity data to calculate time-dependent estimates of mRNA production and decay rates. By separating RNA velocity into transcription and decay rates, we show that rapamycin treatment causes existing ribosomal protein transcripts to be rapidly destabilized, while production of new transcripts gradually slows over the course of an hour. The neural network framework we present is designed to explicitly model causal regulatory relationships between transcription factors and their genes, and shows superior performance to existing models on the basis of recovery of known regulatory relationships. We validated the predictive power of the model by perturbing transcription factors in silico and comparing transcriptome-wide effects with experimental data. Our study represents the first step in constructing a complete, predictive, biophysical model of gene expression regulation.
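
Note on the abstract above: separating RNA velocity into production and decay can be illustrated with a simple first-order kinetic model, dx/dt = production - decay * x. The following is a minimal sketch under that assumed model only; it is not the authors' neural network, and the function name and all numbers are hypothetical.

```python
import math


def decay_rate(velocity, production, abundance):
    """Solve velocity = production - decay * abundance for the decay constant.

    Assumes first-order decay kinetics (an illustrative assumption,
    not the model described in the abstract).
    """
    if abundance <= 0:
        raise ValueError("abundance must be positive")
    return (production - velocity) / abundance


# Hypothetical transcript: 100 copies per cell, produced at 5 copies/min,
# with an observed net change (velocity) of -5 copies/min.
rate = decay_rate(velocity=-5.0, production=5.0, abundance=100.0)
half_life = math.log(2) / rate  # half-life in minutes under first-order decay
```

Under this toy model, a negative velocity with unchanged production implies an elevated decay constant, which is the kind of destabilization the abstract reports for ribosomal protein transcripts after rapamycin treatment.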

13.
Nat Biotechnol ; 2023 Sep 07.
Article in English | MEDLINE | ID: mdl-37679542

ABSTRACT

Exploiting sequence-structure-function relationships in biotechnology requires improved methods for aligning proteins that have low sequence similarity to previously annotated proteins. We develop two deep learning methods to address this gap, TM-Vec and DeepBLAST. TM-Vec allows searching for structure-structure similarities in large sequence databases. It is trained to accurately predict TM-scores as a metric of structural similarity directly from sequence pairs without the need for intermediate computation or solution of structures. Once structurally similar proteins have been identified, DeepBLAST can structurally align proteins using only sequence information by identifying structurally homologous regions between proteins. It outperforms traditional sequence alignment methods and performs similarly to structure-based alignment methods. We show the merits of TM-Vec and DeepBLAST on a variety of datasets, including better identification of remotely homologous proteins compared with state-of-the-art sequence alignment and structure prediction methods.
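
Note on the abstract above: the TM-score that TM-Vec is trained to predict is conventionally computed from a structural superposition. The sketch below uses the standard TM-score definition and assumes the per-residue distances between aligned pairs are already available; it does not reproduce TM-Vec, which predicts the score from sequence alone, and the function name is hypothetical.

```python
def tm_score(distances, target_length):
    """TM-score from distances (in angstroms) between aligned residue pairs.

    Standard definition: (1 / L) * sum(1 / (1 + (d_i / d0)^2)),
    with d0 = 1.24 * (L - 15)^(1/3) - 1.8 and L the target length.
    """
    if target_length <= 15:
        raise ValueError("this sketch assumes target_length > 15")
    d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    if d0 <= 0:
        raise ValueError("target too short for the d0 formula in this sketch")
    # Unaligned residues contribute nothing, so the sum runs over aligned pairs
    # only while the normalization uses the full target length.
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length
```

A perfect superposition of every residue gives a score of 1, and aligning only half of the target perfectly gives 0.5, which is why the score is read as a length-normalized measure of structural similarity.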

14.
ArXiv ; 2023 Aug 10.
Article in English | MEDLINE | ID: mdl-37608940

ABSTRACT

Multiple sequence alignments (MSAs) of proteins encode rich biological information and have been workhorses in bioinformatic methods for tasks like protein design and protein structure prediction for decades. Recent breakthroughs like AlphaFold2 that use transformers to attend directly over large quantities of raw MSAs have reaffirmed their importance. Generation of MSAs is highly computationally intensive, however, and no datasets comparable to those used to train AlphaFold2 have been made available to the research community, hindering progress in machine learning for proteins. To remedy this problem, we introduce OpenProteinSet, an open-source corpus of more than 16 million MSAs, associated structural homologs from the Protein Data Bank, and AlphaFold2 protein structure predictions. We have previously demonstrated the utility of OpenProteinSet by successfully retraining AlphaFold2 on it. We expect OpenProteinSet to be broadly useful as training and validation data for 1) diverse tasks focused on protein structure, function, and design and 2) large-scale multimodal machine learning research.

15.
Nat Neurosci ; 26(7): 1208-1217, 2023 Jul.
Article in English | MEDLINE | ID: mdl-37365313

ABSTRACT

Autism spectrum disorder (ASD) is a neurodevelopmental disorder characterized by heterogeneous cognitive, behavioral and communication impairments. Disruption of the gut-brain axis (GBA) has been implicated in ASD although with limited reproducibility across studies. In this study, we developed a Bayesian differential ranking algorithm to identify ASD-associated molecular and taxa profiles across 10 cross-sectional microbiome datasets and 15 other datasets, including dietary patterns, metabolomics, cytokine profiles and human brain gene expression profiles. We found a functional architecture along the GBA that correlates with heterogeneity of ASD phenotypes, and it is characterized by ASD-associated amino acid, carbohydrate and lipid profiles predominantly encoded by microbial species in the genera Prevotella, Bifidobacterium, Desulfovibrio and Bacteroides and correlates with brain gene expression changes, restrictive dietary patterns and pro-inflammatory cytokine profiles. The functional architecture revealed in age-matched and sex-matched cohorts is not present in sibling-matched cohorts. We also show a strong association between temporal changes in microbiome composition and ASD phenotypes. In summary, we propose a framework to leverage multi-omic datasets from well-defined cohorts and investigate how the GBA influences ASD.


Subject(s)
Autism Spectrum Disorder , Gastrointestinal Microbiome , Humans , Gastrointestinal Microbiome/genetics , Brain-Gut Axis , Autism Spectrum Disorder/genetics , Autism Spectrum Disorder/metabolism , Cross-Sectional Studies , Bayes Theorem , Reproducibility of Results , Cytokines
16.
Microbiome ; 11(1): 136, 2023 Jun 17.
Article in English | MEDLINE | ID: mdl-37330554

ABSTRACT

BACKGROUND: Disruption of the microbial community in the respiratory tract due to infections, like influenza, could impact transmission of bacterial pathogens. Using samples from a household study, we determined whether metagenomic-type analyses of the microbiome provide the resolution necessary to track transmission of airway bacteria. Microbiome studies have shown that the microbial community across various body sites tends to be more similar between individuals who cohabit in the same household than between individuals from different households. We tested whether there was increased sharing of bacteria from the airways within households with influenza infections as compared to control households with no influenza. RESULTS: We obtained 221 respiratory samples that were collected from 54 individuals at 4 to 5 time points across 10 households, with and without influenza infection, in Managua, Nicaragua. From these samples, we generated metagenomic (whole genome shotgun sequencing) datasets to profile microbial taxonomy. Overall, specific bacteria and phages were differentially abundant between influenza positive households and control (no influenza infection) households, with bacteria like Rothia, and phages like Staphylococcus P68virus that were significantly enriched in the influenza-positive households. We identified CRISPR spacers detected in the metagenomic sequence reads and used these to track bacteria transmission within and across households. We observed a clear sharing of bacterial commensals and pathobionts, such as Rothia, Neisseria, and Prevotella, within and between households. However, due to the relatively small number of households in our study, we could not determine if there was a correlation between increased bacterial transmission and influenza infection. CONCLUSION: We observed that airway microbial composition differences across households were associated with what appeared to be different susceptibility to influenza infection. We also demonstrate that CRISPR spacers from the whole microbial community can be used as markers to study bacterial transmission between individuals. Although additional evidence is needed to study transmission of specific bacterial strains, we observed sharing of respiratory commensals and pathobionts within and across households.


Subject(s)
Influenza, Human , Microbiota , Micrococcaceae , Humans , Clustered Regularly Interspaced Short Palindromic Repeats , Influenza, Human/prevention & control , Bacteria , Metagenome/genetics , Microbiota/genetics , Micrococcaceae/genetics
17.
mSystems ; 8(2): e0117822, 2023 Apr 27.
Article in English | MEDLINE | ID: mdl-37010293

ABSTRACT

Comprehensive protein function annotation is essential for understanding microbiome-related disease mechanisms in the host organisms. However, a large portion of human gut microbial proteins lack functional annotation. Here, we have developed a new metagenome analysis workflow integrating de novo genome reconstruction, taxonomic profiling, and deep learning-based functional annotations from DeepFRI. This is the first approach to apply deep learning-based functional annotations in metagenomics. We validate DeepFRI functional annotations by comparing them to orthology-based annotations from eggNOG on a set of 1,070 infant metagenomes from the DIABIMMUNE cohort. Using this workflow, we generated a sequence catalogue of 1.9 million nonredundant microbial genes. The functional annotations revealed 70% concordance between Gene Ontology annotations predicted by DeepFRI and eggNOG. DeepFRI improved the annotation coverage, with 99% of the gene catalogue obtaining Gene Ontology molecular function annotations, although they are less specific than those from eggNOG. Additionally, we constructed pangenomes in a reference-free manner using high-quality metagenome-assembled genomes (MAGs) and analyzed the associated annotations. eggNOG annotated more genes on well-studied organisms, such as Escherichia coli, while DeepFRI was less sensitive to taxa. Further, we show that DeepFRI provides additional annotations in comparison to the previous DIABIMMUNE studies. This workflow will contribute to novel understanding of the functional signature of the human gut microbiome in health and disease as well as guiding future metagenomics studies. IMPORTANCE: The past decade has seen advancement in high-throughput sequencing technologies resulting in rapid accumulation of genomic data from microbial communities. While this growth in sequence data and gene discovery is impressive, the majority of microbial gene functions remain uncharacterized. The coverage of functional information coming from either experimental sources or inferences is low. To solve these challenges, we have developed a new workflow to computationally assemble microbial genomes and annotate the genes using a deep learning-based model DeepFRI. This improved microbial gene annotation coverage to 1.9 million metagenome-assembled genes, representing 99% of the assembled genes, which is a significant improvement compared to 12% Gene Ontology term annotation coverage by commonly used orthology-based approaches. Importantly, the workflow supports pangenome reconstruction in a reference-free manner, allowing us to analyze the functional potential of individual bacterial species. We therefore propose this alternative approach combining deep-learning functional predictions with the commonly used orthology-based annotations as one that could help us uncover novel functions observed in metagenomic microbiome studies.


Subject(s)
Deep Learning , Microbiota , Humans , Metagenome/genetics , Molecular Sequence Annotation , Microbiota/genetics , Genome, Microbial
18.
Nat Commun ; 14(1): 2351, 2023 Apr 26.
Article in English | MEDLINE | ID: mdl-37100781

ABSTRACT

For the past half-century, structural biologists relied on the notion that similar protein sequences give rise to similar structures and functions. While this assumption has driven research to explore certain parts of the protein universe, it disregards spaces that don't rely on this assumption. Here we explore areas of the protein universe where similar protein functions can be achieved by different sequences and different structures. We predict ~200,000 structures for diverse protein sequences from 1,003 representative genomes across the microbial tree of life and annotate them functionally on a per-residue basis. Structure prediction is accomplished using the World Community Grid, a large-scale citizen science initiative. The resulting database of structural models is complementary to the AlphaFold database, with regards to domains of life as well as sequence diversity and sequence length. We identify 148 novel folds and describe examples where we map specific functions to structural motifs. We also show that the structural space is continuous and largely saturated, highlighting the need for a shift in focus across all branches of biology, from obtaining structures to putting them into context and from sequence-based to sequence-structure-function based meta-omics analyses.


Subject(s)
Protein Folding , Proteins , Proteins/metabolism , Amino Acid Sequence , Structure-Activity Relationship , Databases, Protein
19.
Methods Mol Biol ; 2627: 141-166, 2023.
Article in English | MEDLINE | ID: mdl-36959446

ABSTRACT

Structures of membrane proteins are challenging to determine experimentally and currently represent only about 2% of the structures in the Protein Data Bank. Because of this disparity, methods for modeling membrane proteins are fewer and of lower quality than those for modeling soluble proteins. However, better expression, crystallization, and cryo-EM techniques have prompted a recent increase in experimental structures of membrane proteins, which can act as templates to predict the structure of closely related proteins through homology modeling. Because homology modeling relies on a structural template, it is easier and more accurate than fold recognition methods or de novo modeling, which are used when the sequence similarity between the query sequence and the sequence of related proteins in structural databases is below 25%. In homology modeling, a query sequence is mapped onto the coordinates of a single template and refined. With the increase in available templates, several templates often cover overlapping segments of the query sequence. Multi-template modeling can be used to identify the best template for local segments and join them into a single model. Here we provide a protocol for modeling membrane proteins from multiple templates in the Rosetta software suite. This approach takes advantage of several integrated frameworks, namely, RosettaScripts, RosettaCM, and RosettaMP with the membrane scoring function.


Subject(s)
Membrane Proteins , Software , Membrane Proteins/chemistry , Molecular Dynamics Simulation , Models, Chemical , Protein Conformation , Structural Homology, Protein
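
Note on entry 19: the abstract states that fold recognition or de novo modeling is used when sequence similarity to known structures falls below 25%. A minimal sketch of that decision rule follows. It uses percent identity over aligned columns as a stand-in for similarity, which is an assumption; the function names are hypothetical, and nothing here is part of the Rosetta protocol itself.

```python
def percent_identity(aligned_query, aligned_template):
    """Percent of aligned (non-gap) columns with identical residues."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where neither sequence has a gap character.
    pairs = [(q, t) for q, t in zip(aligned_query, aligned_template)
             if q != "-" and t != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for q, t in pairs if q == t)
    return 100.0 * matches / len(pairs)


def suggest_approach(identity, threshold=25.0):
    """Pick a modeling route using the 25% cutoff quoted in the abstract."""
    if identity >= threshold:
        return "homology modeling"
    return "fold recognition or de novo modeling"


# Hypothetical pairwise alignment with one gapped column.
identity = percent_identity("ACDE-G", "ACDQFG")
approach = suggest_approach(identity)
```

In multi-template modeling the same check could be applied per template or per local segment, although how templates are actually ranked in the published protocol is not specified in the abstract.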
20.
bioRxiv ; 2023 Feb 03.
Article in English | MEDLINE | ID: mdl-36778259

ABSTRACT

The modeling of gene regulatory networks (GRNs) is limited due to a lack of direct measurements of regulatory features in genome-wide screens. Most GRN inference methods are therefore forced to model relationships between regulatory genes and their targets with expression as a proxy for the upstream independent features, complicating validation and predictions produced by modeling frameworks. Separating covariance and regulatory influence requires aggregation of independent and complementary sets of evidence, such as transcription factor (TF) binding and target gene expression. However, the complete regulatory state of the system, e.g. TF activity (TFA) is unknown due to a lack of experimental feasibility, making regulatory relations difficult to infer. Some methods attempt to account for this by modeling TFA as a latent feature, but these models often use linear frameworks that are unable to account for non-linearities such as saturation, TF-TF interactions, and other higher order features. Deep learning frameworks may offer a solution, as they are capable of modeling complex interactions and capturing higher-order latent features. However, these methods often discard central concepts in biological systems modeling, such as sparsity and latent feature interpretability, in favor of increased model complexity. We propose a novel deep learning autoencoder-based framework, StrUcture Primed Inference of Regulation using latent Factor ACTivity (SupirFactor), that scales to single cell genomic data and maintains interpretability to perform GRN inference and estimate TFA as a latent feature. We demonstrate that SupirFactor outperforms current leading GRN inference methods, predicts biologically relevant TFA and elucidates functional regulatory pathways through aggregation of TFs.
